simon-5502-10-slides

Topics to be covered

  • What you will learn
    • Quiz and poll questions
    • Empiricism and its critics
    • Recidivism case study
    • Search history case study
    • License plate case study
    • Personalized medicine case study
    • What’s the cause and what’s the cure

Talk given to first year medical students

  • Topic also relevant to this class.
  • Only a few minor changes
    • Different format for the “programming” assignment

Who am I?

Steve Simon

  • PhD Statistics, 1982, U Iowa
  • Teach in Biomedical and Health Informatics
    • Previous jobs at CMH, CDC
  • Part-time independent statistical consultant (P.Mean Consulting)
  • Married to a Pediatric Cardiologist (retired)
  • Run 5K and 4 mile races

Obsessed with computers since 1972

Figure 1. Section on computer skills from my resume

Worked with health care applications since 1987

  • Recent positions
    • Centers for Disease Control and Prevention (1987-1996)
    • Children’s Mercy Hospital (1996-2008)
    • UMKC School of Medicine (2008 to present)
  • But…
    • I am not a doctor
    • Still confused about many things
      • Example: Difference between good and bad cholesterol.

Quiz questions (1/3)

Why does Joel Best call statistics a social construct?

  • Statistics are misquoted often on social media.
  • Statistics are selected, shaped, and presented by human beings.
  • Statistics are used to promote socialism.
  • Statistics are dehumanizing.

Quiz questions (2/3)

What is the main philosophical foundation of empiricism?

  • Everything can be reduced to a mathematical equation.
  • Experiments can reveal the realities of the world.
  • Some questions are impossible to answer.
  • We construct our own reality based on our own lived experiences.

Quiz questions (3/3)

What is a major problem with data science?

  • Data scientists rely on large amounts of data with uneven quality.
  • Models developed by data scientists can lead to loss of privacy.
  • Prediction models are a black box that can hide discriminatory intent.
  • All of the above.

First poll question

Figure 2. Quote from “Peggy Sue Got Married”

Second poll question

Figure 3. Images of various computers

Are Statisticians Gods?

I’m helping someone who wants an alternative statistical analysis to the one used by the principal investigator. I’m happy to help and will offer advice about why my approach may be better, but I was warned that the PI considers the analysis chosen to be ordained by the “Statistical Gods” at her place of work.

Break #1

  • What you have learned
    • Quiz and poll questions
  • What’s coming next
    • Empiricism and its critics

Statistics are a social construct

Figure 4. Cover from Joel Best’s book

The arrogance of empiricism (1/2)

“I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.” Lord Kelvin.

The arrogance of empiricism (2/2)

“No human investigation can be called real science if it cannot be demonstrated mathematically.” Leonardo da Vinci

Why empiricism fails (1/3)

“The government is very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn well pleases.” Sir Josiah Stamp

Why empiricism fails (2/3)

Figure 5. Cover of Stephen Jay Gould’s book

Why empiricism fails (3/3)

Figure 6. Frame from Yes, Prime Minister video clip

But numbers still have value. Example: Quality of Life

Figure 7. First few questions from SF-36 form

Rules for using Statistics as social constructs

  1. Understand the context in which the Statistic was generated.

  2. Identify possible biases

  3. Recognize limitations

  4. Beware of confirmation bias

  5. Avoid nihilistic thinking

Things have gotten worse, thanks to big data

  1. Large amounts of data of uneven quality

  2. Black box models

  3. Lack of accountability

  4. Scaling problems

  5. Loss of privacy

Break #2

  • What you have learned
    • Empiricism and its critics
  • What’s coming next
    • Recidivism case study

Case study evaluations

  • Answer the following questions
    • Who is the villain?
    • Who is the victim?
    • How was the victim harmed?
    • What could have prevented this?
    • Did anything surprise you?
    • Did you disagree with anything in the article?
    • Is there a single quote from the article that summarizes it well?

Here are the articles for your review

Figure 8. Excerpts from three articles

Weapons of Math Destruction

Figure 9. Cover of Cathy O’Neil’s book

First villain

  • Walter Quijano
    • Provided testimony on recidivism rates at seven trials
    • Unfairly included race in his calculations and testimony
    • Six of seven convictions later overturned

Second villain

  • LSI-R questionnaire
    • Given to thousands of inmates
    • Classifies risk of recidivism
    • Does not explicitly ask about race
    • But does have “leading” questions

Victims

  • Inmates at parole hearings
  • Defendants at trial sentencing

How were the victims harmed?

  • Disproportionate recidivism risks by race
    • Fewer parole grants
    • Longer sentences
  • No avenue to appeal
    • Model presumed to be unbiased
    • Complexity prevents examination of bias
  • Scale issues
    • Walter Quijano harmed 7 defendants
    • LSI-R harmed thousands of defendants.

What could have prevented this?

  • Insist on transparency
  • Test the model for bias
  • Build the model with a better objective
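
One way to test a model for bias, sketched here under assumptions that are mine and not the book's: compare the false positive rate (people flagged as high risk who did not reoffend) across groups. The function name, the group labels, and the records below are all made up for illustration. This is one simple check among many, not a description of how the LSI-R was or should be audited.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """Hypothetical helper. records holds (group, flagged_high_risk, reoffended) tuples."""
    flagged = defaultdict(int)  # did not reoffend, but the model flagged them
    total = defaultdict(int)    # everyone who did not reoffend
    for group, flagged_high_risk, reoffended in records:
        if not reoffended:
            total[group] += 1
            if flagged_high_risk:
                flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}

# Made-up records for illustration only; not real data.
records = [
    ("A", True, False), ("A", False, False), ("A", False, False), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", False, False),
    ("A", True, True), ("B", True, True),
]

rates = false_positive_rate_by_group(records)
print(rates)  # {'A': 0.25, 'B': 0.5}
```

In this toy example, group B is wrongly flagged twice as often as group A. A gap like that would be a reason to look harder at the model, even if race never appears as an input.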

Did anything surprise you?

  • Questionnaire includes questions that would be inadmissible if they were asked during a normal trial
    • When was the first time you were ever involved with the police?
    • Do any of your friends or relatives have a criminal record?

Did you disagree with anything in the article?

  • No

Is there a single quote from the article that summarizes it well?

“The questionnaire includes circumstances of a criminal’s birth and upbringing, including his or her family, neighborhood, and friends. These details should not be relevant to a criminal case or to the sentencing.”

Break #3

  • What you have learned
    • Recidivism case study
  • What’s coming next
    • Search history case study

A Face Is Exposed for AOL Searcher No. 4417749

Figure 10. First page of newspaper article

Who is the villain?

  • America Online
    • Collected data that could harm people
    • Released data without addressing privacy concerns
      • Searches for relatives
      • Neighborhood associations
      • Even self-searches
    • Search terms can be embarrassing
      • “Dog that urinates on everything”

Victims

  • Thelma Arnold (user 4417749)
  • Other AOL searchers
    • 3505202 “depression and medical leave”
    • 7268042 “fear that spouse contemplating cheating”

How were the victims harmed?

  • Revelation of personal details
  • Loss of trust

What could have prevented this?

  • Understanding that anonymization is difficult
  • Don’t collect/store sensitive information
    • “AOL keeps a record of each user’s search queries for one month, Mr. Weinstein said. This allows users to refer back to previous searches and is also used by AOL to improve the quality of its search technology.”

Did anything surprise you?

  • How much we reveal about ourselves when we search the Internet.

Did you disagree with anything in the article?

  • No

Is there a single quote from the article that summarizes it well?

“Ms. Arnold says she loves online research, but the disclosure of her searches has left her disillusioned. In response, she plans to drop her AOL subscription. ‘We all have a right to privacy,’ she said. ‘Nobody should have found this all out.’”

Break #4

  • What you have learned
    • Search history case study
  • What’s coming next
    • License plate case study

How a ‘NULL’ License Plate Landed One Hacker in Ticket Hell

Figure 11. First page of newspaper article

Who is the villain?

  • Government bureaucracies that can’t work with one another
  • Programmers who don’t understand the difference between the string “NULL” and a null value.
  • Joseph Tartaro
    • Thought that a NULL license plate might cause the computer system to lose his traffic violations
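
The difference between the string “NULL” and a null value can be sketched in a few lines. The article does not show the actual code behind the ticketing system, so everything here (the ticket records, the function names, and the bug itself) is a hypothetical reconstruction of how this kind of mix-up could happen.

```python
# Hypothetical ticket records. None means no plate was recorded on the ticket.
TICKETS = [
    {"id": 1, "plate": "ABC123"},
    {"id": 2, "plate": None},
    {"id": 3, "plate": None},
    {"id": 4, "plate": "NULL"},  # a real vanity plate that spells NULL
]

def tickets_for_plate_buggy(tickets, plate):
    """Buggy: a missing plate is flattened to the text 'NULL' before matching."""
    as_text = [(t["id"], "NULL" if t["plate"] is None else t["plate"]) for t in tickets]
    return [ticket_id for ticket_id, text in as_text if text == plate]

def tickets_for_plate_safe(tickets, plate):
    """Safer: a missing plate (None) never matches any real plate."""
    return [t["id"] for t in tickets if t["plate"] is not None and t["plate"] == plate]

buggy = tickets_for_plate_buggy(TICKETS, "NULL")
safe = tickets_for_plate_safe(TICKETS, "NULL")
print(buggy)  # [2, 3, 4]
print(safe)   # [4]
```

In the buggy version, every ticket with a missing plate lands on the owner of the NULL vanity plate. The safer version keeps "no value" and "the text NULL" apart.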

How was the victim harmed?

  • $12,049 worth of traffic fines
  • Lost time trying to resolve things

What could have prevented this?

  • Quality review of code
  • Flexibility to allow for common sense deviations

Did anything surprise you?

  • How hard it has been to fix things.

Did you disagree with anything in the article?

  • No

Is there a single quote from the article that summarizes it well?

  • “Prank or not, Tartaro was playing with fire by going with NULL in the first place. ‘He had it coming,’ says Christopher Null, a journalist who has written previously for WIRED about the challenges his last name presents. ‘All you ever get is errors and crashes and headaches.’ If anything, Null says, the problem has gotten worse over the years. ‘The “minimum viable product” concept has pushed a lot of bad code through that doesn’t go through with the proper level of testing,’ Null says, adding that anyone affected is inevitably an edge case, a relatively small problem not worth devoting a lot of resources to fix.”

Break #5

  • What you have learned
    • License plate case study
  • What’s coming next
    • Personalized medicine case study

How Bright Promise in Cancer Testing Fell Apart

Figure 12. First page of newspaper article

Questions to answer

  • Who is the villain?
  • Who is the victim?
  • How was the victim harmed?
  • What could have prevented this?
  • Did anything surprise you?
  • Did you disagree with anything in the article?
  • Is there a single quote from the article that summarizes it well?

Please work independently on this assignment.

Break #6

  • What you have learned
    • Personalized medicine case study
  • What’s coming next
    • What’s the cause and what’s the cure

Myths about big data

  1. Algorithms are objective

  2. If you have enough data, quality is no longer an issue

  3. We are getting better at this

No single cause of these problems

  1. Wrong data

  2. Wrong objective

  3. Wrong deployment

  4. Wrong team

What they say data science is

What data science really is

Why you are needed

  1. Too many geeks, not enough scientists
  2. More racial, gender diversity is needed

If you are interested, email me: simons@umkc.edu

You can find my talk here:

https://github.com/pmean/papers-and-presentations/blob/master/dark-side/2022-talk.pptx

Break #7

  • What you have learned
    • What’s the cause and what’s the cure
  • What’s coming next
    • Problems with large language models

Existential Comics, 1

Existential Comics, 2

Existential Comics, 3

Existential Comics, 4

Existential Comics, 5

Existential Comics, 6

Existential Comics, 7

What are some of the concerns with this new technology?

  • Copyright concerns
  • Style and manner
  • Hallucinations
  • Reinforcing stereotypes

A personal note about intellectual property

  • I used copyrighted material in this talk
    • Falls under fair use provisions
      • Educational purposes
      • Limited amount of material
      • Does not compete in the potential market
  • I always try to give credit to my sources

Style and manner, 1

Style and manner, 2

Style and manner, 3

Style and manner, 4

Plagiarism concerns

  • Will Generative AI pass “TurnItIn.com”?
  • Will you acknowledge your source?

Hallucinations

  • Training data contains accurate and inaccurate information
  • Combines data from multiple sources
  • Like autocomplete on steroids
  • Proposed solution: fact-checking using limited but reliable sources
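
The “autocomplete on steroids” idea can be illustrated with a toy next-word predictor. This is a deliberately tiny sketch and not how any real large language model is built: the three training sentences and the function name are made up, and the predictor only counts which word most often follows the current one.

```python
from collections import Counter, defaultdict

# Made-up training sentences; none of them states anything false.
corpus = [
    "paris is the capital of france",
    "she is fond of france",
    "rome is the capital of italy",
]

# Count which word follows which across all sentences.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

def autocomplete(start, max_words=10):
    """Hypothetical helper: keep appending the most frequent next word."""
    words = [start]
    while len(words) < max_words and words[-1] in following:
        next_word, _ = following[words[-1]].most_common(1)[0]
        words.append(next_word)
    return " ".join(words)

completion = autocomplete("rome")
print(completion)  # rome is the capital of france
```

No training sentence is false, yet the completion is. The predictor stitches fragments from different sources together because “of france” is the more common continuation, which is a small-scale version of combining data from multiple sources.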

Reinforcing prejudices and stereotypes, 1

Reinforcing prejudices and stereotypes, 2

Reinforcing prejudices and stereotypes, 3

Promising technologies

  • GitHub Copilot
  • NotebookLM
  • Gemini

Summary

  • What you have learned
    • Quiz and poll questions
    • Empiricism and its critics
    • Recidivism case study
    • Search history case study
    • License plate case study
    • Personalized medicine case study
    • What’s the cause and what’s the cure
    • Problems with large language models